Single pass graph sparsification in distributed stream processing

Authors

  • Ashish Goel
  • Michael Kapralov
  • Olga Kapralova
  • Sanjeev Khanna
Abstract

We give a distributed one pass streaming algorithm for graph sparsification. Besides producing a sparsifier, our algorithm maintains a hierarchy of UNION-FIND data structures in a distributed manner that efficiently support queries of strong connectivities between pairs of vertices. An important component of the algorithm is an implementation of UNION-FIND queries over an Active Distributed Hash Table that guarantees good load balancing properties. This is achieved via a single step of what is known in the literature as the zig-zag heuristic. We provide theoretical guarantees for the load balancing achieved by this heuristic, and show how the structure of our sparsification scheme ensures good load balancing across the hierarchy of UNION-FIND data structures maintained by the algorithm. We also present simulation results on synthetic as well as real world data verifying the load balancing properties and the quality of approximation of strong connectivities achieved by the algorithm.
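For readers unfamiliar with the UNION-FIND primitive mentioned in the abstract, the following is a minimal single-machine sketch in Python. It is not the paper's distributed implementation: it tracks plain connectivity rather than strong connectivity, keeps everything in local dictionaries rather than a distributed hash table, and the names `UnionFind` and `process_stream` are hypothetical, chosen only for illustration.

```python
class UnionFind:
    """Single-machine UNION-FIND with path compression and union by size.

    Illustrative only: the paper's version is distributed over a hash table.
    """

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, v):
        # Register unseen vertices lazily, as they first appear in the stream.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
        # Walk up to the representative (root) of v's component.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def union(self, u, v):
        # Merge the components of u and v; return True if they were distinct.
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        return True

    def connected(self, u, v):
        return self.find(u) == self.find(v)


def process_stream(edges):
    """Consume an edge stream in a single pass, maintaining connectivity."""
    uf = UnionFind()
    for u, v in edges:
        uf.union(u, v)
    return uf


uf = process_stream([(1, 2), (2, 3), (4, 5)])
print(uf.connected(1, 3))  # vertices 1 and 3 are joined through 2
print(uf.connected(1, 4))  # no edge links {1, 2, 3} to {4, 5}
```

Each edge is examined once and then discarded, which is what makes the structure a natural fit for one-pass streaming; how the paper layers a hierarchy of such structures and balances load across machines is not shown here.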

Similar resources

Single pass sparsification in the streaming model with edge deletions

In this paper we give a construction of cut sparsifiers of Benczúr and Karger in the dynamic streaming setting in a single pass over the data stream. Previous constructions either required multiple passes or were unable to handle edge deletions. We use Õ(1/ε) time for each stream update and Õ(n/ε) time to construct a sparsifier. Our ε-sparsifiers have O(n log n/ε) edges. The main tools behind o...

Sparsification Algorithm for Cut Problems on Semi-streaming Model

The emergence of social networks and other interaction networks has brought to the fore the question of how to process massive graphs. The (semi-)streaming model, where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), is a useful and efficient model for processing large graphs. In many of these graphs the numbers of vertices are significantly less t...

Graph Sparsification in the Semi-streaming Model

Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of ...

Scalable Linked Data Stream Processing via Network-Aware Workload Scheduling

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...

Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...

Publication date: 2011